Unsupervised induction of user intents from conversational customer service corpora

ABSTRACT

A methodology and system are presented for inducing user intent in a corpus and storing this intent in an intent library. To accurately detect intent, the corpus is first cleaned of nonsensical words and symbols and then syntactically analyzed to extract words and dependencies between them, which are then semantically analyzed to select keywords that are indicative of intent, and map the keywords to ordered broad semantic categories of the types of action, modifier and object. Keywords are then converted into embedding vectors whose dimensions are reduced and clustered according to category and order. Relations are calculated for the clusters across the semantic categories and intent is then calculated with the help of intent templates and word dictionaries.

FIELD

The present application relates to systems, devices, apparatuses and methods of analyzing dialogue. More particularly, the application relates to determining user intent from conversational dialogue.

BACKGROUND

Humans have developed very complex linguistic and mental skills during their evolution. Such skills are routinely used when interacting with one another and more recently with computer-based systems. One may consider a human asking a simple question like “what is my account balance” to a bank clerk or via a phone banking system to a human operator, or a computer. The user may use his voice, while in alternative scenarios he may type his question in a graphical box of a chat-based interface. There are numerous ways the same (or other) human may ask the same question, like “I would like to know my credit account balance please”, “what's my balance”, “how much money is still in my deposit account”, etc. These variations can grow exponentially if, for example, a user has different types of accounts and/or if his input contains words that are not directly related to his request or are semantically empty (e.g. “em . . . ”, “hm . . . ”, “well . . . ”).

Detecting the user's intent, i.e. “to find out his account balance”, is complicated enough for a human operator, as he would first have to identify the useful part of the user's utterance and then try to make sense out of it (e.g. linguistically and/or by combining context and other related information on the accounts the user possesses etc.), so as to service the user's request.

This operation is many orders of magnitude more complex when serviced by a computer-based system, as the system does not possess the intelligence of the average human. In recent years, significant developments have been made in automated speech processing and text analysis and lately, methods have been proposed to use such analysis to identify user intent. Such methods are based on modeling natural language using statistical and other mathematical methods. They typically involve human supervision in at least some of their method steps like, for instance, dataset labeling for training algorithms for intent classification.

Automatic intent induction systems that require labeled datasets are tailored to the specific needs of narrowly defined use cases and domains (e.g. banking or retail), while outside such pre-defined use, system efficacy, accuracy and speed of operation are seriously hampered. As a result, when the domain changes, such systems either need serious parameterization involving heavy human intervention or their performance and output is of no practical value. In addition, labeled datasets limit the use of such systems to a specific language, which further complicates the situation.

It is apparent from the above limitations that an accurate, efficient, and scalable method is needed to automatically identify user intent in unconstrained contexts. Such a method can make use of automatically compiled libraries of user intents, available for use in real-time intent detection, i.e. during user interactions with computer-based systems without altering the usual user routine when verbally or textually interacting with such systems. There is, therefore, a need to automatically build and/or update user intent libraries.

SUMMARY

The present application relates to systems, devices, apparatuses and methods of automatically inducing user intent in unstructured conversational (dialogue) corpora. The application leverages various techniques within the fields of speech processing, natural language processing, artificial intelligence, and machine learning. More precisely, the application relies on the combined use of grammatical knowledge (acquired from syntactic parsing models) and lexical knowledge (acquired from distributional semantics models referred to as vector space models) to cluster user utterances in coherent intent groups and induce explicit descriptions of the semantic components of intents. The present application includes an innovative solution aimed at creating and updating intent libraries for use in identification of the intent of a user interacting with a human agent or a computer system. Before detecting the user's intent in a corpus, the present solution assumes that speech is converted to text, if the user interacts in uttered speech.

The corpus is preprocessed using language models and/or word dictionaries to remove words and symbols with no linguistic value. A sentence segmentation model identifies sentence boundaries in the clean corpus, which is subsequently analyzed with a syntactic model. The latter identifies binary relations (dependencies) between words, on top of part-of-speech tags.

Semantic analysis follows to select keywords that convey the user's intent, and map the keywords to semantic categories, or keyword types (Actions, Modifiers, Objects). The dependencies between keywords are projected to dependencies between keyword types and the latter are combined in so-called AMO triplets that are used to represent the meaning of each corpus sentence. AMO triplets are populated with keywords while the model keeps track of the dependencies between them. In certain implementations, one user intent is semantically equivalent to at least one AMO triplet.

Keywords are then converted into embedding vectors and the vector dimensions are reduced before the vectors are clustered. Clustering of keyword vectors takes place inside each semantic category (keyword type) at each AMO level and lists of semantically related words (i.e. keyword dictionaries) are output. Keyword relations (dependencies) are projected to cluster relations inside and across AMO levels. The clusters and their relations are used to create intent templates that are equivalent to semantic descriptions of intents. Empty slots in the templates are filled with lexical entries from the automatically acquired keyword dictionaries.

In a variation of the above methodology, sentence embeddings are calculated from keyword embeddings using one of a set of proposed methods. The sentence embeddings are then clustered in coherent groups, which also represent intents.

In yet another exemplary implementation, sentence clusters are used to validate the intent semantic structure produced from keyword clusters and their relations and, therefore, increase accuracy and performance of the calculation method resulting in improved intent libraries.

The intents in the updated intent library are then made available for user intent induction during the user's interaction with any third party system. To facilitate this interaction, the identified user intent is mapped onto one or more actions, which are sent to the third party system or application.

In one aspect, a system for updating an intent library includes a syntactic parser arranged to process a sequence of word tokens and control characters of at least one sentence in a corpus and produce words and dependencies between the words. The system also includes a semantic analyzer arranged to process the words and dependencies between the words for extracting a set of keywords and arranged to map the keywords to action (A), modifier (M) and object (O) semantic categories and create ordered AMO triplets. The system further includes an embeddings processor arranged to convert the extracted keywords in the ordered AMO triplets into keyword embedding vectors and reduce the dimensions of the keyword embedding vectors in each of the action, modifier and object semantic category and in each order of the AMO triplets. The system includes a clustering processor arranged to cluster the reduced dimension keyword embedding vectors, where each keyword cluster contains semantically similar keywords, and which keywords in a cluster express a single intent. The system also includes an intent calculator arranged to calculate cluster relations, create intent templates, fill empty positions in the intent templates, and store the intent clusters and the intents the clusters represent to the intent library.

In some configurations, the system includes a pre-processor arranged to eliminate words and marks that have no linguistic value from a corpus, and arranged to create a sequence of word tokens and pairs of sentence boundary control characters, where the corpus comprises at least one sentence. The intent calculator may be arranged to validate the intent semantic structure. The intent calculator may be configured to assign intent labels to intent clusters and store the intent labels to the intent library.

In some implementations, any one of, portion of, or grouping of the pre-processor, the syntactic parser, the semantic analyzer, the embeddings processor, the clustering processor, or the intent calculator may be implemented in one of an application server, a user device, a multi-processor system, a multicore-processor, and a multi-processor system where each processor is a multi-core processor. The system may include an action processor arranged to map each intent onto one or more actions and output each action to at least one external system.

Another aspect includes a server configured to cluster keywords. The server includes a communications interface arranged to receive text from at least one of an automated speech recognition (ASR) module and a user interface, the text forming at least one sentence in a corpus. The server also includes a processor arranged to: syntactically parse a sequence of word tokens and control characters of the at least one sentence in the corpus to produce words and dependencies between the words; semantically analyze the words and dependencies between the words for extracting a set of keywords and means to map the keywords to action (A), modifier (M) and object (O) semantic categories and create ordered AMO triplets; convert the extracted keywords in the ordered AMO triplets into keyword embedding vectors and reduce the dimensions of the keyword embedding vectors in each of the action, modifier and object semantic category and in each order of the AMO triplets; and cluster the reduced dimension keyword embedding vectors, where each keyword cluster contains semantically similar keywords, and which keywords in a cluster express a single intent.

In some implementations, the processor is configured to: i) eliminate words and marks that have no linguistic value from the corpus, and ii) create the sequence of word tokens and pairs of sentence boundary control characters. In some implementations, the processor is configured to calculate cluster relations, create intent templates, fill empty positions in the intent templates, and store the intent clusters and the intents the clusters represent to an intent library. The processor may also be configured to assign intent labels to intent clusters, which labels are found in the intent library and store the intent labels to the intent library.

In a further aspect, a computer implemented method for updating an intent library includes a portion of or all of the following steps: pre-processing a corpus to eliminate words and symbols that have no linguistic value, where the corpus comprises at least one sentence, and to create a sequence of word tokens and pairs of sentence boundary control characters; syntactically processing the sequence of tokens to produce a grammatical-syntactical representation of the at least one sentence in the corpus; semantically processing the grammatical/syntactical representation of the at least one sentence in the corpus to extract a set of keywords; mapping each extracted keyword to one of action (A), modifier (M) and object (O) semantic category; representing the order of appearance of the extracted keywords as different levels of actions (A), modifiers (M) and objects (O); calculating binary relations between the extracted keywords; combining and prioritizing the binary relations into ordered AMO triplets, where each AMO triplet describes one intent and contains at least one keyword; converting the extracted keywords in the ordered AMO triplets into keyword embedding vectors; mapping the extracted keywords in the ordered AMO triplets onto an embedding space, where each keyword is converted to an n-dimensional embedding vector; reducing the dimensions of the keyword embedding vectors in each of the action, modifier and object semantic category and in each of the ordered AMO triplets; clustering the keyword embedding vectors, where each cluster contains semantically similar keywords; creating cluster combinations, where each cluster combination represents a single intent; and entering the cluster combinations into the intent library.

With respect to patent eligibility, the above aspects should not be considered directed to an abstract idea. Instead, the above aspects should be considered directed to an Internet-centric problem or improvement of computer technology related to more efficient automatic determinations of user intent from conversation dialogues that advantageously reduces memory and processing demands on a corpora analysis system. By converting extracted keywords from a corpus into ordered AMO triplets to create keyword clusters that contain semantically similar keywords, where each keyword cluster expresses a single intent, a corpora analysis system is able to more efficiently infer, induce, and/or determine a user's intent from their conversational dialogue. While the above aspects could involve abstract ideas, the inventive concepts are not directed to such ideas standing alone. A long-standing problem with corpora analysis systems is how to quickly, efficiently, and reliably determine the intent of the author of a conversational dialogue (corpora). The above aspects are directed to technically improving the speed, efficiency, and reliability, while reducing the cost in processing and memory of determining user intent from conversational dialogue.

Even if additional features of the above aspects, when viewed individually, are considered generic computer and networking functions, an inventive concept exists because of the unconventional and non-generic combination of known elements, including converting the extracted keywords from a corpus into ordered AMO triplets to create keyword clusters that contain semantically similar keywords, where each keyword cluster expresses a single intent, enabling more efficient and reliable determinations of a user's intent. Furthermore, the various features and limitations of the above aspects should confine any abstract ideas to a particular and practical application of those abstract ideas such that the combination of features is not a well-understood, routine or conventional activity. The above comments should apply to any other aspects described herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a flowchart of a methodology for automatically processing user utterances to induce intents and create or populate user intent libraries.

FIG. 2A shows an example of a graphical representation of a syntactic structure.

FIG. 2B shows another example of a graphical representation of a syntactic structure.

FIG. 3 shows a flowchart of a methodology for calculating multi-level cluster relations.

FIG. 4 shows an example extract of clusters for the United States banking domain for 100,000 utterances not readily associated with an existing user intent.

FIG. 5 shows an example set of AMO clusters and the application of an Action Filter.

FIG. 6 shows how an intent library interfaces with external systems for intent induction and with systems that perform an action based on the induced intent.

FIG. 7 shows a hardware diagram for an intent induction and action system 500.

FIG. 8 shows an example implementation of an intent induction system and its connections to enable the flow of data.

FIG. 9A shows the hardware architecture of an application server or other hardware implementing the intent induction system.

FIG. 9B shows a system for intent induction using multiple processors.

FIG. 9C shows a system for intent induction using multiple processing cores.

FIG. 10 shows the basic software components 700 running on an application server.

FIG. 11 shows the main Software Components of a device.

DETAILED DESCRIPTION

The word “exemplary” is used herein to mean “serving as an example, instance, or illustration”. Any implementation described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other implementations.

The acronym “IVR” is intended to mean “Interactive Voice Response”.

The acronym “NLU” is intended to mean “Natural Language Understanding”.

The acronym “ASR” is intended to mean “Automatic Speech Recognition”.

The acronym “DM” is intended to mean “Dialogue Manager”.

The acronym “PSTN” is intended to mean “Public Switched Telephone Network”.

The acronym “PLMN” is intended to mean “Public Land Mobile Network”.

The acronym “VAD” is intended to mean “Voice Activity Detector”.

The acronym “UI” is intended to mean “User Interface”.

The acronym “OS” is intended to mean “Operating System”.

The term “mobile device” may be used interchangeably with “client device” and “device with wireless capabilities”.

The term “user” may be used interchangeably with “regular user” and “ordinary user” and “speaker”. It may also be used to mean “caller” in a telephone or VOIP call or conferencing session, “user of an application” or “user of a service”, and “participant” in a text chat, audio chat, video chat, email, audio-conference or video-conference session.

The term “system” may be used interchangeably with “device”, “apparatus”, and “service”, except where it is obvious to a reader of ordinary skill in related art that these terms refer to different things, as this is apparent by the context of the discussion in which they appear. Under any circumstance, and unless otherwise explicitly stated or implicitly hinted at in the description, these four terms should be considered to have the broadest meaning i.e. that of encompassing all four.

The present invention addresses a technical problem of automatically inducing user intent libraries from unlabeled dialogue data. An intent library includes semantically homogeneous groups of user utterances, i.e. surface linguistic expressions that may be uttered or typed by users when interacting with a human operator or a computer system. Each of these groups implicitly captures the semantics of a user intent and can optionally be mapped to an explicit semantic description of the intent, i.e., a description of the semantic components of the intent. That is, each intent description clusters words and phrases conveying similar meanings in a single (common) semantic abstraction.

In some implementations, the present invention offers a solution for more accurate, faster, domain agnostic, automatic creation of user intent libraries, which can be used to accurately and efficiently induce user intent in real time.

One solution proposed by the present systems and methods involves no or minimal human intervention, while being scalable (with respect to the languages and the domains it supports) and cost-efficient to operate. Furthermore, the use of the systems and methods disclosed herein does not alter the usual user routine when verbally or textually interacting with a computer-based system. A user routine may include, for example, the user speaking (or typing) in natural language, potentially using jargon, non-useful words like “ehm” etc., or engaging in a natural language conversation with a human, or computer system, without having to utter a predefined text like a training text. In other words, a typical user routine is intuitive to (most) users and this routine is not interrupted or altered by the use of the proposed innovative solution.

The present systems and methods can be used in a variety of business domains involving customer service/support applications (e.g. banking, e-commerce, telecommunications, healthcare, etc.) and can be integrated with a variety of systems like, for example, voice recognition and processing systems, Automatic Speech Recognition (ASR), Interactive Voice Response (IVR) systems, Dialogue Management (DM) systems, text-based DMs, automated customer support systems, search engines, text processing systems, user interaction systems, and any systems using voice or text interaction to service a user request and perform an action (e.g. for data access and processing, control of an external system, etc.).

Intent Induction

FIG. 1 shows a high-level flowchart of a methodology or process for automatically processing user utterances to induce intents and create or populate user intent libraries. Methodology 100 starts with a corpus of written or spoken user utterances in the context of a dialogue. Such a corpus may include text typed by the user in a chat box or text produced by processing the user's speech via an ASR system. In the former case, the corpus may also contain punctuation marks, emoticons, and special characters. In the latter case, the corpus may also contain ASR tags created during the conversion of speech to text by an ASR system or module.

The methodology includes the following steps, starting by using linguistic knowledge to structure raw corpus data.

A user's utterance includes words that are semantically contentful and others that relate to non-linguistic aspects of communication. The latter usually contain no useful information for detecting the user's intent. Consider two versions of an example utterance acquired (a) from a chat bot corpus, (b) from an ASR-transcribed corpus:

-   (a) hey, my name is Daniel Howard need help plz . . . wanna know my last balance . . . & can you guys help me pay my bill online plz
-   (b) hey _COU_my name is daniel howard _SP_need help please _SP_wanna know my last balance _HES_and can you guys help me pay my bill online _BRE_please

The ASR tags in (b) mark the following non-linguistic information:

-   _COU_: cough
-   _SP_: short pause
-   _HES_: hesitation
-   _BRE_: breathing

A pre-processing step 110 is used to remove tokens from the corpus that add “noise” to the intent induction task. In an exemplary implementation, this can be achieved using word dictionaries and language models 111 developed for a single spoken language, as well as simple heuristics with the addition of terminology and jargon that may be used by the user or the operator. Word dictionaries and language models identify words that possess a linguistic value (i.e. higher than “0” value, say “1” for example). All words defined in a language possess a value with the exception of words that are not generally accepted (i.e. are regarded as non-existent) and which are of zero or no linguistic value. In yet another exemplary implementation, such dictionaries and language models may combine two or more spoken languages. By way of example, rules may be applied to remove words (e.g. “ehm”, “gr”, “mmm”, “ergutaretmd”, etc.) that are not found in standard or customized monolingual or multilingual word dictionaries or are not recognized by language models. Rules may also remove symbols such as emoticons, tags or code snippets, and they will replace abbreviated word forms or symbols with full (proper) word forms (e.g. “plz” → “please”, “&” → “and”).
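The cleaning rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of step 110: the names `VOCABULARY`, `EXPANSIONS`, `ASR_TAG` and `clean_utterance` are hypothetical, the tiny vocabulary stands in for the word dictionaries and language models 111, and the “wanna” expansion is inferred from the cleaned example sentences rather than stated as a rule.

```python
import re

# Hypothetical stand-in for the word dictionaries / language models 111:
# any token outside this set is treated as having no linguistic value.
VOCABULARY = {
    "hey", "my", "name", "is", "daniel", "howard", "need", "help",
    "please", "want", "to", "know", "last", "balance", "and", "can",
    "you", "guys", "me", "pay", "bill", "online",
}

# Abbreviated forms and symbols mapped to full word forms (assumed table).
EXPANSIONS = {"plz": "please", "&": "and", "wanna": "want to"}

# ASR tags such as _COU_, _SP_, _HES_, _BRE_.
ASR_TAG = re.compile(r"_[A-Z]+_")


def clean_utterance(text: str) -> list[str]:
    """Return the tokens of `text` that carry linguistic value."""
    text = ASR_TAG.sub(" ", text)  # drop ASR tags before lower-casing
    tokens = re.findall(r"[a-z']+|&", text.lower())
    cleaned = []
    for token in tokens:
        # Expand abbreviations, then keep only dictionary words.
        expanded = EXPANSIONS.get(token, token).split()
        cleaned.extend(word for word in expanded if word in VOCABULARY)
    return cleaned


print(clean_utterance("hey _COU_my name _SP_wanna know my last balance"))
print(clean_utterance("need help plz ehm & pay my bill"))
```

A dictionary lookup is the simplest possible filter; a language model, as the text suggests, could additionally score tokens in context.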

Corpus preprocessing 110 may also involve the use of off-the-shelf (pre-trained) models to identify sentence boundaries (sentence segmentation) and perform co-reference resolution (e.g. in a sentence such as “I wanna know my balance and pay it”, the system should identify that the word “it” refers to “balance”). These tasks are critical for structuring user utterances using the sentence(s) they may contain, and for identifying links between entities across sentences. The latter are essential for understanding the content of the expressed request(s).

The output of the pre-processing step 110 is one or more sentences containing sequences of tokens (words) including punctuation, starting at a Sentence Start (SS) and ending at a Sentence End (SE) (control characters), lacking any ASR tags, emoticons, and abbreviations. For the above example, the output is the following (“S” stands for “Sentence”):

-   S-1: [SS] hey, my name is daniel howard [SE]
-   S-2: [SS] need help please [SE]
-   S-3: [SS] want to know my last balance and [SE]
-   S-4: [SS] can you guys help me pay my bill online please [SE]

The clean sentences are fed to a syntactic processing module 120 (i.e. some kind of syntactic parser using a syntactic model 121), whose output is a representation of the grammatical structure of the sentence, including the grammatical properties of tokens and binary relations between them. In an exemplary implementation, a dependency parsing model may be used. Dependency parsing is a syntactic parsing paradigm representing sentence structure in terms of binary relations between “heads” and “dependents” (i.e. words that modify “heads”). Each token is identified on the basis of (i) a part-of-speech (POS) tag, (ii) the “head” token, on which it depends, and (iii) a tag describing the type of dependency between the two tokens. A dependency parser provides an approximation of the semantic (meaning) dependencies in a sentence.

An example of dependency parsing is shown below (“S” stands for “Sentence”):

-   S-1: [‘hey’, ‘my’, ‘name’, ‘is’, ‘daniel’, ‘howard’]
-   POS [‘UH’, ‘PRP$’, ‘NN’, ‘VBZ’, ‘NNP’, ‘NNP’]
-   HEAD [‘is’, ‘name’, ‘is’, ‘is’, ‘howard’, ‘is’]
-   DEPENDENCY [‘intj’, ‘poss’, ‘nsubj’, ‘ROOT’, ‘compound’, ‘attr’]
-   S-2: [‘need’, ‘help’, ‘please’]
-   POS [‘VBP’, ‘NN’, ‘VB’]
-   HEAD [‘need’, ‘need’, ‘need’]
-   DEPENDENCY [‘ROOT’, ‘dobj’, ‘intj’]
-   S-3: [‘want’, ‘to’, ‘know’, ‘my’, ‘last’, ‘balance’, ‘and’]
-   POS [‘VBP’, ‘TO’, ‘VB’, ‘PRP$’, ‘JJ’, ‘NN’, ‘CC’]
-   HEAD [‘want’, ‘know’, ‘want’, ‘balance’, ‘balance’, ‘know’, ‘want’]
-   DEPENDENCY [‘ROOT’, ‘aux’, ‘xcomp’, ‘poss’, ‘amod’, ‘dobj’, ‘cc’]
-   S-4: [‘can’, ‘you’, ‘guys’, ‘help’, ‘me’, ‘pay’, ‘my’, ‘bill’, ‘online’, ‘please’]
-   POS [‘MD’, ‘PRP’, ‘NNS’, ‘VB’, ‘PRP’, ‘VB’, ‘PRP$’, ‘NN’, ‘RB’, ‘VB’]
-   HEAD [‘help’, ‘guys’, ‘help’, ‘help’, ‘pay’, ‘help’, ‘pay’, ‘pay’, ‘pay’]
-   DEPENDENCY [‘aux’, ‘nmod’, ‘nsubj’, ‘ROOT’, ‘nsubj’, ‘ccomp’, ‘poss’, ‘dobj’, ‘advmod’, ‘intj’]

FIGS. 2A and 2B show example graphical representations of the dependency parsing output. In particular, FIG. 2A shows the graphical representation of the syntactic structure for “want to know my last balance and” 190, and FIG. 2B shows a graphical representation of the syntactic structure for “pay my bill online” 195.

In sentence 190, the verb “know” depends on verb “want” (i.e. “want” is the head of “know”), while the particle “to” and the noun phrase “my last balance” both depend on the verb “know”.

In sentence 195, the noun phrase “my bill” and the adverbial “online” depend on the verb “pay”.

An output of the syntactic (dependency) parser 120 is used as input to a semantic module, which first aims to prune each sentence by selecting a set of keywords 125, assumed to convey its core meaning. Step 125 includes a reduction operation on the length of the sentence. Keyword selection is based on a semantic model 126 that prioritizes a subset of the grammatical relations (dependencies) returned by the syntactic parser in the previous step 120 as semantically relevant for intent induction. For example, selecting direct objects (dobj) of verbs, adjectival modifiers (amod) of nouns, and adverbial modifiers (advmod) of verbs results in pruning the sentences of the above example, i.e., reducing them to the following lists of keywords:

-   S-2: help
-   S-3: know, last, balance
-   S-4: help, pay, bill, online

For each binary relation (dobj, amod, advmod, etc.), the model specifies whether one or both tokens should be added to the list of keywords. For instance, in the verb phrase “know my last balance”, “know” and “balance” are both selected keywords. In an exemplary implementation, post-processing of the selected keywords may eventually result in an even shorter list. For example, in “need help” the model gives priority to the noun (“help”), removing the verb (“need”) from the words with meaning of potential interest. The verb “need” is removed during keyword post-processing because it belongs to a finite set of pseudo-modal verbs in English. This kind of knowledge may be added to the model to restrict the list of selected keywords.

While extracting keywords from a sentence, the model keeps track of the dependencies between them. Therefore, a more accurate representation of the above lists of keywords is the following. In square brackets, we show the dependencies between keywords.

-   S-2: [-, help],
-   S-3: [know, balance], [last, balance]
-   S-4: [help, pay], [pay, bill], [pay, online]

(“-” indicates an empty dependency, i.e. the token (“need”) on which there is a dependency has been discarded from selected keywords).
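The keyword selection and dependency tracking described above can be sketched as follows. This is an illustrative sketch, not the semantic model 126 itself: the names `Token`, `SELECTED_RELATIONS`, `PSEUDO_MODALS` and `select_keyword_pairs` are hypothetical, heads are given as token indices rather than words, both tokens of every selected relation are kept (the model may keep only one), and the pseudo-modal set lists only the one verb named in the text.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    text: str
    head: int  # index of the head token within the sentence
    dep: str   # dependency label


# Assumed for this sketch: the prioritized relations from the example in
# the text, and an incomplete list of pseudo-modal verbs to discard.
SELECTED_RELATIONS = {"dobj", "amod", "advmod"}
PSEUDO_MODALS = {"need"}

Pair = tuple[Optional[str], Optional[str]]


def select_keyword_pairs(sentence: list[Token]) -> list[Pair]:
    """Return keyword pairs in sentence order; None marks a discarded token."""
    spans = []
    for index, token in enumerate(sentence):
        if token.dep in SELECTED_RELATIONS:
            # Store each relation as (earlier index, later index).
            spans.append(tuple(sorted((index, token.head))))
    pairs = []
    for first, second in sorted(spans):
        words = (sentence[first].text, sentence[second].text)
        pairs.append(tuple(None if w in PSEUDO_MODALS else w for w in words))
    return pairs


# Parses of S-2 and S-3 transcribed from the dependency example above.
S2 = [Token("need", 0, "ROOT"), Token("help", 0, "dobj"),
      Token("please", 0, "intj")]
S3 = [Token("want", 0, "ROOT"), Token("to", 2, "aux"),
      Token("know", 0, "xcomp"), Token("my", 5, "poss"),
      Token("last", 5, "amod"), Token("balance", 2, "dobj"),
      Token("and", 0, "cc")]

print(select_keyword_pairs(S2))
print(select_keyword_pairs(S3))
```

With these inputs the function yields one pair with an empty slot for S-2 and the two pairs (know, balance) and (last, balance) for S-3, matching the lists above.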

The semantic module subsequently maps 128 each one of the extracted keywords to one of three coarse-grained semantic categories: Actions, Modifiers, and Objects. On the basis of their POS tags and the POS tags of their heads, keywords tagged with the selected grammatical relations are identified as instances of one of these keyword types. Hence the lists of extracted keywords can be coded as lists of keyword types and the binary relations between keywords can be projected to binary relations between keyword types, as shown below.

-   S-2: [-, Action],
-   S-3: [Action, Object], [Modifier, Object]
-   S-4: [Action, Action], [Action, Object], [Action, Modifier]
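A simplified sketch of this projection is given below. It is an assumption-laden illustration: `POS_PREFIX_TO_TYPE`, `keyword_type` and `project_pairs` are hypothetical names, and the mapping looks only at a keyword's own POS tag. The text also uses the POS tag of the head, which is presumably what types the noun “help” in S-2 as an Action; that rule is not specified here, so the sketch does not attempt it and covers S-3 and S-4 only.

```python
from typing import Optional

# Assumed coarse mapping from Penn Treebank POS-tag prefixes to keyword types.
POS_PREFIX_TO_TYPE = {
    "VB": "Action",    # VB, VBP, VBZ, ...
    "NN": "Object",    # NN, NNS, NNP, ...
    "JJ": "Modifier",  # adjectives
    "RB": "Modifier",  # adverbs
}


def keyword_type(pos_tag: str) -> Optional[str]:
    """Map a POS tag to Action, Modifier or Object (None if unmapped)."""
    for prefix, kw_type in POS_PREFIX_TO_TYPE.items():
        if pos_tag.startswith(prefix):
            return kw_type
    return None


def project_pairs(pairs, pos_tags):
    """Project keyword pairs onto pairs of keyword types."""
    return [tuple(keyword_type(pos_tags[word]) for word in pair)
            for pair in pairs]


# POS tags transcribed from the dependency example for S-3 and S-4.
s3_pos = {"know": "VB", "last": "JJ", "balance": "NN"}
s4_pos = {"help": "VB", "pay": "VB", "bill": "NN", "online": "RB"}

print(project_pairs([("know", "balance"), ("last", "balance")], s3_pos))
print(project_pairs(
    [("help", "pay"), ("pay", "bill"), ("pay", "online")], s4_pos))
```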

Finally, the semantic module is responsible for building a structured representation for each sentence based on the set of binary relations attested between keyword types in the sentence. The binary relations between keyword types are combined into Triplets of Actions, Modifiers and Objects (AMO) 129. The AMO Triplets are the building blocks of the intent descriptions (templates) that will be created. Each AMO triplet corresponds to at least one user intent. By way of example, “know”, “last”, and “balance” form an instance of an AMO Triplet, which in turn corresponds to the Account_Balance_Inquiry intent.

For each sentence, AMO Triplets are populated with the extracted keywords. Not all keyword types in an AMO triplet need to be populated. In a fully populated Triplet, one of the keywords has a relation to both other keywords. E.g. “balance” has a relation to both “know” and “last”; “pay” has a relation to both “bill” and “online”.

Keywords of the same type are represented in separate AMO Triplets, regardless of whether they are connected with a binary relation or not. Therefore, the number of AMO Triplets for a sentence equals the maximum number of any one of the keyword types attested in the sentence. For example, if one (1) Modifier, two (2) Objects, and three (3) Actions are attested in a given sentence, the model will build three (3) AMO Triplets.
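The counting rule in the preceding paragraph amounts to taking the largest per-type count. A minimal sketch, with the hypothetical function name `count_amo_triplets`:

```python
from collections import Counter


def count_amo_triplets(keyword_types: list[str]) -> int:
    """Number of AMO Triplets = highest count of any single keyword type."""
    counts = Counter(keyword_types)
    return max(counts.values(), default=0)


# One Modifier, two Objects and three Actions, as in the example above.
example = ["Action", "Action", "Action", "Object", "Object", "Modifier"]
print(count_amo_triplets(example))

# Keyword types of "help, pay, bill, online".
print(count_amo_triplets(["Action", "Action", "Object", "Modifier"]))
```

The first call corresponds to the three Triplets of the example in the text; the second gives two, because that sentence contains two Actions.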

Tables 1.a-1.c contain the six binary relations in step 125 that are combined in the following three AMO Triplets in step 128. Each AMO Triplet in step 128 is identified in terms of the order in which it appears in the sentence.

TABLE 1.a AMO triplet representing S1.

     1      2
A    help
M
O

TABLE 1.b AMO triplet representing S2.

     1         2
A    know
M    last
O    balance

TABLE 1.c AMO triplet representing S3.

     1      2
A    help   pay
M           online
O           bill

The next step continues by using Vector Space Models to process structured corpus representations.

The semantic module described above converts unstructured corpus sentences to structured sets of ordered AMO Triplets by first keeping or discarding tokens from the utterance on the basis of semantic relevance, then mapping the selected tokens (keywords) to semantic types (keyword types), and combining keyword types to semantic (AMO) structures. The entire corpus is converted to a chart of AMO Triplets 129 populated with keywords in the order in which they appear in corpus sentences. Each ordered AMO Triplet specifies a representation level. Thus “help”, “know”, and “help” in Tables 1.a-1.c (respectively) populate Actions of the first representation level, while “pay” populates Actions of the second representation level.

AMO Triplets for individual corpus sentences (Tables 1.a-1.c) are merged in a single corpus representation (Table 2). Blanks represent the cases where no keyword and corresponding relation was found.

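TABLE 2 Merging AMO Triplets to represent entire corpus

|           | Level 1, S1 | Level 1, S2 | Level 1, S3 | Level 2, S1 | Level 2, S2 | Level 2, S3 |
|-----------|-------------|-------------|-------------|-------------|-------------|-------------|
| Actions   | help        | know        | help        |             |             | pay         |
| Modifiers |             | last        | online      |             |             |             |
| Objects   |             | balance     | bill        |             |             |             |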

The keywords are then projected to an n-dimensional embeddings space, i.e. are turned into n-dimensional vectors 130 (word embeddings representations), using an embeddings model 131. Word embeddings are a baseline technique for adding pretrained lexical semantic knowledge to NLP applications. That is, keywords are projected to a semantic space (the embeddings space) and are mapped to vectors of real numbers representing distributional properties of words in large language data. Word vectors, in effect, quantify lexical meaning in terms of the linguistic contexts in which words appear. In an exemplary implementation, off-the-shelf pre-trained vector space models are used, including (but not limited to) word2vec (e.g., trained on Google News), GloVe vectors (e.g., trained on Wikipedia and Common Crawl), subword fastText vectors (e.g., trained on Wikipedia and Common Crawl), and sense2vec vectors (e.g., trained on Reddit). In yet another exemplary implementation, in-house models may be re-trained and tuned to the corpus data available for a specific domain or use case (e.g. banking data acquired from the call center of a specific financial institution). All the above steps are speaker agnostic and thereby do not alter the usual speaker routine during interaction with a human agent or a computer system (i.e., the speaker does not have to use predefined key sentences or words during his interaction and he does not have to train the system by reading a pre-defined training text or by other means).

In yet another exemplary implementation, after projection to the embeddings space 130, dimensionality reduction is performed 135 on word vectors, using a state-of-the-art algorithm such as Principal Component Analysis (explained variance). Dimensionality reduction is performed on the sets of vectors that populate Actions, Modifiers and Objects in each of the sets of ordered AMO Triplets, i.e. separately for Actions, Modifiers and Objects at each AMO level. This entails that the same token does not necessarily have the same vector representation across all AMO Triplets in which it may appear. For example, the word “balance” will have a different representation within the same embeddings space depending on whether it appears as an action, a modifier or an object and, what is more, depending on whether it was the first, second or third action found within the utterance. Therefore, pre-trained vectors are informed by the semantic types of keywords and their occurrences within ordered AMO Triplets.

In an alternative exemplary implementation, sentence 140 embeddings 160 are computed using the word embeddings 135 (or the word embeddings 130 in a variation of this exemplary implementation [not shown in FIG. 1]) of the identified keywords in each sentence. A variety of different methods can be used to compute sentence embeddings 160. In the next paragraphs, two of these methods are described.

In the first method, corpus sentences are represented using a concatenation of two vectors. The first vector is calculated by max pooling the n dimensions of the word embeddings of the identified keywords of the sentence. The second vector is calculated as the weighted average of the word embeddings of the identified keywords of the sentence, where the weights are calculated using the frequencies of words in the English Wiki dump.
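A minimal sketch of the first method follows, using plain Python lists as vectors. The text does not specify how frequencies are turned into weights; the smoothed inverse-frequency weight `a / (a + freq)`, the parameter `a`, the function name and the toy values are all assumptions chosen for illustration.

```python
def sentence_embedding_maxpool_avg(keywords, vectors, word_freq, a=1e-3):
    """Concatenate a max-pooled vector with a frequency-weighted average."""
    rows = [vectors[w] for w in keywords]
    dims = len(rows[0])
    # First vector: per-dimension maximum over the keyword embeddings.
    max_pooled = [max(row[d] for row in rows) for d in range(dims)]
    # Second vector: weighted average; rarer words get larger weights
    # (assumed weighting scheme, not taken from the description).
    weights = [a / (a + word_freq.get(w, 0.0)) for w in keywords]
    total = sum(weights)
    weighted_avg = [
        sum(wt * row[d] for wt, row in zip(weights, rows)) / total
        for d in range(dims)
    ]
    return max_pooled + weighted_avg

# Toy 2-dimensional embeddings and relative frequencies (made-up values).
toy_vectors = {"pay": [1.0, 0.0], "bill": [0.0, 2.0]}
toy_freq = {"pay": 0.001, "bill": 0.001}
embedding = sentence_embedding_maxpool_avg(["pay", "bill"], toy_vectors, toy_freq)
```

The resulting vector has 2n dimensions: the first n hold the max-pooled values and the last n hold the weighted average.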

In the second method, corpus sentences are represented using the weighted average of the centroids of the word vectors of the keywords of the sentence. The weights are determined by the type of each of the keywords, assigning a first weight to Actions, a second weight to Modifiers, and a third weight to Objects. Combining these averages, we end up with the representation of the meaning of each sentence as a whole.
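One possible reading of the second method is sketched below: a centroid is computed per keyword type, and the centroids are then combined with type-specific weights. The function name, the type codes and the weight values are hypothetical.

```python
def sentence_embedding_by_type(typed_keywords, vectors, type_weights):
    """Weighted average of per-type centroids of the keyword vectors."""
    groups = {}
    for keyword, keyword_type in typed_keywords:
        groups.setdefault(keyword_type, []).append(vectors[keyword])
    dims = len(next(iter(vectors.values())))
    combined = [0.0] * dims
    total_weight = 0.0
    for keyword_type, rows in groups.items():
        # Centroid of all keyword vectors of this type in the sentence.
        centroid = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]
        w = type_weights[keyword_type]
        total_weight += w
        for d in range(dims):
            combined[d] += w * centroid[d]
    return [x / total_weight for x in combined]

toy_vectors = {"know": [1.0, 0.0], "last": [0.0, 1.0], "balance": [1.0, 1.0]}
# Hypothetical weights that emphasize Objects over Actions and Modifiers.
type_weights = {"A": 1.0, "M": 1.0, "O": 2.0}
sentence_vec = sentence_embedding_by_type(
    [("know", "A"), ("last", "M"), ("balance", "O")], toy_vectors, type_weights)
```

Raising the weight of one keyword type pulls the sentence vector toward keywords of that type, which is what makes it possible to compute sentence clusters "on weighted Objects", for instance.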

Unsupervised Clustering of Vector Representations

An unsupervised clustering algorithm is subsequently applied on either one of the two types of embeddings computed above, i.e. word embeddings 150 or sentence embeddings 170. An optional step of assigning intent labels or numerals to the clusters of keyword embeddings 150 or sentence embeddings 170 may be added in alternative exemplary implementations. Methodology 100 ends by creating or updating an intent library 185 with the computed intent clusters 150 or 170.

In what follows, we describe two alternative implementations for clustering vectors, and additionally one implementation that combines the other two implementations.

FIG. 3 shows a high-level flowchart of a methodology for calculating multi-level cluster relations. In a first exemplary implementation, clustering operates on sentence embeddings 160 or 212 calculated from keyword embeddings 130 or 210 for each corpus sentence. Sentence embeddings 212 are clustered 214, to create clusters of user intent 216 (explained below), and the clusters of user intent of the first exemplary implementation are outputted 218. The clustered sentences (corresponding to the clusters of step 214) are assumed to trigger the same intent, yet the semantic components of the intent are implicit. Such clusters can be used for validation of the explicit semantic descriptions that are produced by the method described in the second exemplary implementation below, i.e. for validating relations between keyword clusters.

In a second exemplary implementation, a multi-level clustering methodology is used to cluster keyword vectors (130 or 135, 210) for each sentence, using keyword types and the levels of AMO Triplets. In particular, a model clusters keyword vectors populating the Action Type, the Modifier Type, and the Object Type, and distinguishes Actions in the first AMO Triplet level from Actions in the second AMO Triplet level, and so on. That is, each keyword type and each AMO Triplet level specifies a clustering level.

Keyword clusters group together semantically similar (i.e. near synonymous, or found within similar/synonymous contexts) words. For instance, tokens such as “know”, “ask”, and “learn” may be grouped together in a cluster that captures an “inquiry” meaning. Such clusters are equivalent to sets of word dictionaries mapping words to distinct meanings (i.e. word senses).

Assume that our corpus includes the following sentences:

-   1. pay my bill online
-   2. wanna know how to pay my bill
-   3. wanna know how I can cancel my bill
-   4. pay my bill
-   5. need help with paying my bill and getting a receipt
-   6. need help about my last balance and about paying my bill online

The selected keywords and their relations are shown below:

-   1. [pay, bill], [pay, online]
-   2. [know, pay], [pay, bill]
-   3. [know, cancel], [cancel, bill]
-   4. [pay, bill]
-   5. [help, paying], [paying, bill], [getting, receipt]
-   6. [help, balance], [last, balance], [help, pay], [pay, bill], [pay, online]

These keywords are structured in the ordered AMO Triplets of Table 3:

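TABLE 3 Example AMO triplet levels for selected keywords.

| Type | Sentence | Level-1 | Level-2 | Level-3 |
|------|----------|---------|---------|---------|
| A    | S1       | pay     |         |         |
| A    | S2       | know    | pay     |         |
| A    | S3       | know    | cancel  |         |
| A    | S4       | pay     |         |         |
| A    | S5       | help    | pay     | get     |
| A    | S6       | help    | pay     |         |
| M    | S1       | online  |         |         |
| M    | S2       |         |         |         |
| M    | S3       |         |         |         |
| M    | S4       |         |         |         |
| M    | S5       |         |         |         |
| M    | S6       | last    | online  |         |
| O    | S1       | bill    |         |         |
| O    | S2       | bill    |         |         |
| O    | S3       | bill    |         |         |
| O    | S4       | bill    |         |         |
| O    | S5       | bill    | receipt |         |
| O    | S6       | balance | bill    |         |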

The clusters of Actions in each of the three clustering levels, i.e. as keywords appear in each of the ordered AMO Triplets in the corpus, are the following:

TABLE 4 Example Clusters of Actions for the 3 levels of selected keywords.

| Type | Level-1                  | Level-2                       | Level-3               |
|------|--------------------------|-------------------------------|-----------------------|
| A    | Payment-cluster (pay)    | Payment-cluster (pay)         | Receive-cluster (get) |
| A    | Inquiry-cluster (know)   | Cancellation-cluster (cancel) |                       |
| A    | Help-cluster (help)      |                               |                       |
| M    | Online-cluster (online)  | Online-cluster (online)       |                       |
| M    | Previous-cluster (last)  |                               |                       |
| O    | Bill-cluster (bill)      | Bill-cluster (bill)           |                       |
| O    | Balance-cluster (balance)|                               |                       |
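The partition of keywords into clustering levels, one per keyword type and AMO Triplet level, can be sketched as follows for the six example sentences. The data layout and the function name `clustering_levels` are assumptions; the sketch only groups the keywords and leaves the actual vector clustering within each group to whichever unsupervised algorithm is chosen.

```python
from collections import defaultdict

# Typed keywords of the six example sentences, in sentence order
# (lemmatized as in Table 3: "paying" -> "pay", "getting" -> "get").
corpus = {
    "S1": [("pay", "A"), ("bill", "O"), ("online", "M")],
    "S2": [("know", "A"), ("pay", "A"), ("bill", "O")],
    "S3": [("know", "A"), ("cancel", "A"), ("bill", "O")],
    "S4": [("pay", "A"), ("bill", "O")],
    "S5": [("help", "A"), ("pay", "A"), ("bill", "O"), ("get", "A"), ("receipt", "O")],
    "S6": [("help", "A"), ("last", "M"), ("balance", "O"), ("pay", "A"),
           ("bill", "O"), ("online", "M")],
}

def clustering_levels(corpus):
    """Map each (keyword type, AMO level) pair to the set of keywords found there."""
    levels = defaultdict(set)
    for typed_keywords in corpus.values():
        seen = defaultdict(int)  # number of keywords of each type seen so far
        for keyword, keyword_type in typed_keywords:
            seen[keyword_type] += 1
            levels[(keyword_type, seen[keyword_type])].add(keyword)
    return dict(levels)

levels = clustering_levels(corpus)
```

Each entry of `levels` is then clustered on its own, so that, for example, “pay” as a first-level Action and “pay” as a second-level Action are handled in separate clustering runs.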

The relations between keywords, projected to relations between keyword types, are then projected to relations between clusters in keyword types. Applying methodology 200 from its start, we begin with the calculation of keyword embeddings 210 (refer to 130, 135 in FIG. 1). For each AMO triplet level 1 . . . n (220 . . . 225), the reduced-dimension keyword embeddings 135 are clustered separately for Actions 221, 226, Modifiers 222, 227 and Objects 223, 228, respectively.

These cluster relations 230, 235 are binary relations between word clusters in various levels of AMO triplets. Such cluster relations are derived from keyword relations received from the syntactic parser in steps 120, 125, 128, 129.
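As an illustration, projecting keyword relations onto cluster relations amounts to replacing each keyword by the cluster it was assigned to at its clustering level. The tuple layout (type, level, keyword), the cluster names and the function name below are hypothetical.

```python
# Hypothetical cluster assignments per (keyword type, level, keyword).
cluster_of = {
    ("A", 1, "pay"): "Payment",
    ("A", 1, "know"): "Inquiry",
    ("A", 2, "pay"): "Payment",
    ("O", 1, "bill"): "Bill",
    ("M", 1, "online"): "Online",
}

def project_relations(keyword_relations, cluster_of):
    """Replace each keyword in a typed, levelled binary relation by its cluster."""
    projected = set()
    for head, dependent in keyword_relations:
        head_cluster = (head[0], head[1], cluster_of[head])
        dep_cluster = (dependent[0], dependent[1], cluster_of[dependent])
        projected.add((head_cluster, dep_cluster))
    return projected

# Relations of example sentences 1 and 2: [pay, bill], [pay, online],
# [know, pay], [pay, bill].
keyword_relations = [
    (("A", 1, "pay"), ("O", 1, "bill")),
    (("A", 1, "pay"), ("M", 1, "online")),
    (("A", 1, "know"), ("A", 2, "pay")),
    (("A", 2, "pay"), ("O", 1, "bill")),
]
cluster_relations = project_relations(keyword_relations, cluster_of)
```

Relations within one AMO level (e.g. first-level Payment to first-level Bill) and across levels (e.g. first-level Inquiry to second-level Payment) come out of the same projection.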

Clusters connected by means of some projected relation are the building blocks for constructing intent descriptions. Simple heuristics 251 are used to convert relations between the keyword types in AMO Triplets 230, 235 and across AMO Triplets 240 into intent templates 250. For instance, in XML pseudocode, the relation between an Action cluster 221 and an Object cluster 223, with either the Action cluster 221 or the Object cluster 223 connected with a Modifier cluster 222, or not, is modeled with the following intent template. Note that entities and relations marked with “?” are optional (i.e. not required in the intent definition). The intent template below includes an optional Modifier and captures two optional dependencies: a Modifier may be dependent on an Object via an “amod” (adjectival modification) relation, or it may be dependent on an Action via an “advmod” (adverbial modification) relation. Obligatory entities (i.e., Action and Object) are connected via an obligatory “dobj” (direct object) relation.

<intent name="11">
  <constraintSet>
    <constraint name="001">
      <entity name="Action" value=" "/>
      <entity name="Object" value=" "/>
      <entity name="?Modifier" value=" "/>
      <relation name="dobj" argument="Action,Object"/>
      <relation name="?amod" argument="Modifier,Object"/>
      <relation name="?advmod" argument="Modifier,Action"/>
    </constraint>
  </constraintSet>
</intent>

In the intent description above, constraint “001” requires the existence of an Action and an Object, and allows for the presence of a Modifier, without requiring it. Note that an intent may be described with more than one constraint such as the above. Alternative representations of intent descriptions may be used instead of the above exemplary intent template.

Slot filling 260 in the intents is implemented by filling empty positions in the intent templates using tokens in keyword clusters (i.e. dictionaries updated with the output of clustering steps 221, 222, 223, . . . , 226, 227, 228) and the resulting intents are outputted 270. A slot filling model automatically generates a list of intents with slots filled from the dictionaries. For example, the following intent captures a “request of bill payment” by means of two intent constraints depicting possible entity configurations. Note that in another exemplary embodiment, tokens filling the entity slots may be lemmatized and slots may eventually be filled with all possible forms of the corresponding lemmas.

<intent name="112">
  <constraintSet>
    <constraint name="001">
      <entity name="Action" value="pay"/>
      <entity name="Object" value="bill,invoice,statement"/>
      <relation name="dobj" argument="Action,Object"/>
    </constraint>
    <constraint name="002">
      <entity name="Action" value="pay"/>
      <entity name="Object" value="bill,invoice,statement"/>
      <relation name="pobj" argument="Action,Object"/>
    </constraint>
  </constraintSet>
</intent>
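Slot filling of this kind can be sketched with Python's standard `xml.etree.ElementTree` module. The helper name `fill_intent`, its arguments, and the choice of one constraint per relation are assumptions made to reproduce the shape of intent “112”; they do not describe the actual slot filling model.

```python
import xml.etree.ElementTree as ET

def fill_intent(name, slots, relations):
    """Build an <intent> element with one constraint per relation name.

    `slots` maps an entity type to the dictionary tokens that fill it;
    `relations` lists the dependency labels that may connect the entities.
    """
    intent = ET.Element("intent", name=name)
    constraint_set = ET.SubElement(intent, "constraintSet")
    for i, relation in enumerate(relations, start=1):
        constraint = ET.SubElement(constraint_set, "constraint", name=f"{i:03d}")
        for entity, tokens in slots.items():
            ET.SubElement(constraint, "entity", name=entity, value=",".join(tokens))
        # The relation connects the entity types in the order they were given.
        ET.SubElement(constraint, "relation", name=relation,
                      argument=",".join(slots))
    return intent

# Hypothetical dictionaries produced by the Action and Object clustering steps.
dictionaries = {"Action": ["pay"], "Object": ["bill", "invoice", "statement"]}
intent = fill_intent("112", dictionaries, ["dobj", "pobj"])
xml_text = ET.tostring(intent, encoding="unicode")
```

Building the element tree programmatically, rather than by string concatenation, keeps every generated intent well-formed regardless of the tokens in the dictionaries.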

Two other intents are exemplified below: one intent captures “incorrect bill payment” (“124”) and the other intent captures “request for payment reversal” (“312”).

<intent name="124">
  <constraintSet>
    <constraint name="001">
      <entity name="Action" value="pay,payoff"/>
      <entity name="Object" value="bill,invoice,statement"/>
      <entity name="Modifier" value="wrong,incorrect"/>
      <relation name="amod" argument="Modifier,Object"/>
    </constraint>
  </constraintSet>
</intent>
<intent name="312">
  <constraintSet>
    <constraint name="001">
      <entity name="Action" value="cancel,reverse"/>
      <entity name="Action" value="pay,payoff"/>
      <relation name="dobj" argument="Action,Action"/>
    </constraint>
    <constraint name="002">
      <entity name="Action" value="cancel,reverse"/>
      <entity name="Action" value="pay,payoff"/>
      <relation name="pobj" argument="Action,Action"/>
    </constraint>
  </constraintSet>
</intent>

Types of cluster relations may be grouped together on the basis of the keyword types and the word clusters they connect. For example, if two types of relations hold between the same keyword types (e.g. between Actions and Objects), instantiated by the same word clusters (e.g. Payment and Statement), then these cluster relations can be merged into a single relation.

-   -   dobj(ACTION, OBJECT)
    -   pobj(ACTION, OBJECT)
    -   dobj(Payment, Statement)
    -   pobj(Payment, Statement)

This relation is converted into the intent “Payment.Bill” and captures utterances like:

-   -   I want to pay my bill.
    -   I want to make a payment on my last statement.

where “pobj” marks a prepositional object, i.e. an argument of (dependent on) a verb or noun introduced with a preposition.
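The merging of cluster relations described above can be sketched as a grouping by the pair of typed clusters that a relation connects. The tuple layout and the function name are assumptions, and the third relation in the example is made up to show a pair that is not merged.

```python
from collections import defaultdict

def merge_cluster_relations(cluster_relations):
    """Group relation labels that connect the same pair of typed clusters."""
    merged = defaultdict(set)
    for label, head, dependent in cluster_relations:
        merged[(head, dependent)].add(label)
    return dict(merged)

relations = [
    ("dobj", ("ACTION", "Payment"), ("OBJECT", "Statement")),
    ("pobj", ("ACTION", "Payment"), ("OBJECT", "Statement")),
    # Hypothetical extra relation connecting a different Action cluster.
    ("dobj", ("ACTION", "Cancellation"), ("OBJECT", "Statement")),
]
merged = merge_cluster_relations(relations)
```

Here the “dobj” and “pobj” relations between Payment and Statement collapse into a single entry, which can then be converted into one intent with two constraints.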

As shown above, intent descriptions in the above second exemplary implementation are associated with explicit semantic components (i.e. keyword type slots and tokens that may fill them) and corpus sentences in which the relations between these components are attested.

Each one of the sentences 125 that is associated with an intent in the intent library induced from relations between keyword clusters 221-223, 226-228 may additionally be associated with a sentence cluster 170, 214.

In a third exemplary implementation, intent utterances acquired from the first exemplary implementation 218 are used to validate the intents induced from the second implementation 270, and vice versa. For example, if the sentences mapped to a certain intent induced from the second implementation 270 are mapped to a single sentence cluster from the first implementation 218, we may validate 280 the semantic structure of the intent from the second implementation 270 and output intent 290. If intent sentences are mapped to more than one sentence cluster 214, then the sentence clusters 214, depending on the weights on the basis of which they have been computed, may indicate more coarse-grained intents, or they may be used to validate inheritance relations between intents.
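The validation check of the third implementation can be sketched as follows. The function name, the mapping from sentences to sentence-cluster identifiers, and the cluster numbers are hypothetical.

```python
def validate_intent(intent_sentences, sentence_cluster_of):
    """Return (is_validated, clusters) for one keyword-based intent.

    The intent is considered validated when all of its sentences fall into
    a single sentence cluster of the first implementation.
    """
    clusters = {sentence_cluster_of[s] for s in intent_sentences}
    return len(clusters) == 1, clusters

# Made-up sentence-cluster assignments.
sentence_cluster_of = {
    "pay my bill": 7,
    "pay my bill online": 7,
    "find account balance": 3,
    "wrong account balance": 4,
}
ok, clusters = validate_intent(["pay my bill", "pay my bill online"],
                               sentence_cluster_of)
split, split_clusters = validate_intent(
    ["find account balance", "wrong account balance"], sentence_cluster_of)
```

When the check fails, the returned set of clusters identifies the candidates for coarser-grained intents or for inheritance relations.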

The use of the third exemplary implementation may increase the accuracy and performance of the calculation method, resulting in improved intent libraries.

In a first example, the sentences below are in the same sentence cluster that was computed on weighted Objects. They correspond to distinct intents in the second implementation, which may, however, be considered for consolidation:

-   card was lost
-   card was stolen

In a second example, the sentences below are in the same sentence cluster that was computed on weighted Objects. They correspond to distinct intents in the second implementation, which should be connected with appropriate intent inheritance relations capturing the fact that they all refer to requests related to “account balance”:

-   find account balance
-   transfer account balance
-   wrong account balance
-   new account balance

Intent inheritance and intent relations can be created manually or automatically using simple rules. Strictly speaking, they are outside the scope of the invention.

FIG. 4 shows an example extract of clusters for the United States banking domain for 100,000 utterances not readily associated with an existing user intent. The “y” axis shows the number of utterances per cluster while the “x” axis shows the cluster number (i.e. a randomly assigned identification number). Clusters 271 are shown in different shades with the number of utterances they contain marked as a percentage 272 of the total number of 100,000 utterances. For better visibility only the percentages 272 for a subset of the clusters 271 are shown.

Below the 2-dimensional graph, an example subset 273 of the above clusters is listed together with labels of intent.

FIG. 5 shows an example set of AMO clusters and the application of an Action Filter. In this example a corpus is filtered with the Action 1 cluster 312 with 18.8% 313 appearance frequency (labeled “pay”; not visible due to space restrictions). The Modifier 1 322 and Object 1 332 clusters that are related to the Action 1 312 cluster are then viewed. A Modifier 1 cluster labeled “incorrect” 323 (the label is not shown) is then selected. Subsequently, three Object clusters 333, 334, 335 related to the Modifier 1 323 cluster are selected, while most frequently the Object cluster labeled “bill” is selected. Utterances 343 that exemplify the chosen filters are shown below the clusters: one can distinguish “paid the wrong bill”, “paid to the wrong account”, “paid wrong mastercard”. On the second clustering level, the data is filtered further by selecting an Action 2 314, Modifier 2 324 or Object 2 334 cluster and receiving the corresponding relations attested in the corpus. Notice, for example, that there is an Action 2 314 cluster labeled “cancel” 315 and an Object 2 324 cluster labeled “emt” (i.e., email money transfer) (the label is not shown) 336: these could potentially be combined with the first clustering level relations in utterances such as “paid wrong mastercard and want to cancel the emt”, which is to be confirmed by filtering the data. The AMO triplet is completed with Action 3 316, Modifier 3 326, and Object 3 337. The remaining clusters in FIG. 5 are shown for a more complete understanding but are not labeled or enumerated for visual simplicity of the figure.

Intent Induction for Taking Actions in Computer Systems and Applications

FIG. 6 shows how an intent library interfaces with external systems for intent induction and with systems that perform an action based on the induced intent. Methodology 400 starts with an utterance entered 405 (i.e. spoken or typed) by the user. One of the intents in a library 411 constructed with methodology 200 is assigned 410 to utterance 405. The task of assigning the correct intent to an unseen utterance is a classification task addressing the similarity of the new utterance to utterances associated with individual intents stored in intent library 411. This task may be tackled via standard machine learning algorithms for classification, known to those of ordinary skill, or using the semantic descriptions of intents in the created intent library 411 (i.e. the collection of all intents induced with the present innovative solution and the intents already populating the same intent library prior to the addition of the new intents induced by the present innovative solution). The latter entails that whenever keywords filling the keyword type and the relation slots in an intent description are detected in a particular utterance, this intent will be assigned to the utterance.
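The description-based assignment mentioned last can be sketched as a simple matching procedure: an intent is assigned when, for at least one of its constraints, every relation is attested in the parsed utterance with tokens from the corresponding slots. The in-memory library layout, the function names and the example parse are assumptions, and optional (“?”) entities are left out for brevity.

```python
def matches(constraint, parsed_relations):
    """True if every relation of the constraint is attested with slot tokens."""
    entities = constraint["entities"]
    for label, head_type, dep_type in constraint["relations"]:
        if not any(
            rel == label and head in entities[head_type] and dep in entities[dep_type]
            for rel, head, dep in parsed_relations
        ):
            return False
    return True

def assign_intent(parsed_relations, library):
    """Return the name of the first intent with a satisfied constraint, else None."""
    for name, constraints in library.items():
        if any(matches(c, parsed_relations) for c in constraints):
            return name
    return None

# Intent "112" reduced to its first constraint, as a plain dictionary.
library = {
    "112": [
        {"entities": {"Action": {"pay"}, "Object": {"bill", "invoice", "statement"}},
         "relations": [("dobj", "Action", "Object")]},
    ],
}
# Hypothetical dependency parse of "I want to pay my bill", already lemmatized.
parsed = [("xcomp", "want", "pay"), ("dobj", "pay", "bill")]
assigned = assign_intent(parsed, library)
```

An utterance whose parse contains no matching relation, such as one with only a relation between “cancel” and “card”, is left unassigned by this sketch.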

Development of or interfacing with a specific application involves mapping 420 each intent in library 411 to a specific action using mapping rules 421. When an intent is mapped to an action 430, the corresponding action is performed 450 by the connected external system or application (not shown). If the mapping of the intent to an action is not successful 430 for whatever reason (e.g. no intent can be associated with an action using mapping rules 421, or incomplete, broken, or empty rules 421 are supplied to step 420, or other), then no action is performed.
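A minimal sketch of the mapping step follows, with mapping rules modeled as a dictionary from intent names to callables. This representation, the function name and the example action are assumptions; the description does not prescribe a rule format.

```python
def perform_action(intent, mapping_rules):
    """Look up the action for an intent; do nothing when no usable rule exists."""
    action = (mapping_rules or {}).get(intent)
    if not callable(action):
        return None  # mapping failed (missing, empty or broken rule): no action
    return action()

# Hypothetical rule: the action is delegated to an external system, simulated
# here by a function that returns the name of the operation to run.
mapping_rules = {"Account_Balance_Inquiry": lambda: "show_balance"}
result = perform_action("Account_Balance_Inquiry", mapping_rules)
```

Missing rules, an empty rule set, or a rule that is not executable all lead to the same outcome: no action is performed.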

FIG. 7 shows a high-level hardware diagram for an intent induction and action system 500. Intent induction system 501 is made up of an ASR module 510, a preprocessor module 520, a syntactic parser (e.g. dependency) module 530, a semantics analyzer module 540, an embeddings processor module 550, a clustering processor module 560, and an intent calculator module 570.

Intent induction system 501 has the goal of creating intent libraries. System 501 achieves its goal by processing a very large corpus (or corpora) offline and inducing a number of intents, utterances that fulfill these intents and precise intent descriptions (i.e. descriptions of the semantic components of the intents).

ASR module 510 is fed with a voice utterance 505 and an acoustic model 515. The ASR 510 converts the input voice into text using acoustic model 515. In an alternative use case where the user input is text (e.g. in a chat interface), ASR 510 is optional or is not used. The text output of ASR 510 is fed to pre-processor 520, which uses a language model 525 to produce a clean text without garbage words, emoticons, punctuation, etc. The clean text is fed to syntactic (e.g. dependency) parser 530 to produce words and dependencies (e.g. binary relations) between them. The output of syntactic parser 530 is fed to semantics analyzer 540, which in turn uses a semantics model 545 to extract a set of keywords. The semantics analyzer 540 also maps keywords to broad semantic categories (i.e. actions, modifiers and objects) and creates ordered (i.e. prioritized) AMO triplets, where each AMO triplet describes an intent and contains at least one keyword. The output of semantics analyzer 540 is fed to embeddings processor 550, which uses an embeddings model 555 to convert the extracted keywords in the ordered AMO triplets into keyword embedding vectors and reduces the dimensions of the keyword embedding vectors in each of the action, modifier and object semantic categories and in each order of the AMO triplets.

The reduced dimension vectors are then fed to clustering processor 560, which creates keyword or sentence clusters. Each keyword cluster contains semantically similar keywords. Clustering processor 560 outputs intent clusters to intent calculator 570. Intent calculator 570 induces cluster relations, creates intent templates, fills slots in the intent templates and optionally validates the intent semantic structure. In alternative exemplary implementations intent calculator 570 also optionally assigns labels to intent clusters. Intent calculator 570 stores in an intent library the intent clusters and the intents the clusters represent, and outputs the induced intent.

Once created or updated, the intent library or libraries are stored locally, remotely, on the cloud, or at any type of centralized or distributed storage according to the specific exemplary implementation used. These libraries are then used at run time when a user's intent is induced from a live (or other) utterance with the help of the contents of the pre-constructed and stored library or libraries.

At runtime a new utterance is received. The intent induction system assigns the received utterance to one of the intents in the library (or libraries) of intents (refer to [0083] for more information).

The induced intent is output to action processor 580, which uses mapping rules 585 to map the intent onto one or more actions and outputs each action 590 for use by one or more external systems. Any action onto which an intent is mapped is performed by a “third party system”.

The modules of system 500 can be combined into new modules each containing two or more of the modules of system 500. Alternatively, all or some module(s) of system 500 may be assigned different tasks or combinations of tasks from those previously described, without altering the scope of protection of the present innovative solution, as this is obvious to any reader of ordinary skill in related art. Also, any of the modules of system 500 may be implemented in any architecture known in prior art. It is obvious to a reader of ordinary skill in related art that modules 500 can be implemented in hardware, software, firmware or a combination of the three.

The models, the outputs of each module and actions 590 may be implemented in any known data format including but not limited to eXtensible Markup Language (XML), American Standard Code for Information Interchange (ASCII), or other, and may be stored in and retrieved from distributed memory, databases, cloud storage or other, while stored at a single storage location or split between storage locations. They may also be encoded and/or encrypted in any available format with any available algorithm implemented in hardware, software, firmware or a combination thereof.

FIG. 8 shows an example implementation of an intent induction system and its connections to enable the flow of data. Systems and devices 503 are interconnected to provide the necessary flow of data. An application server 598 is used in this exemplary implementation of the intent induction system. In alternative exemplary embodiments, hardware implementations of the intent induction system may be used, with purpose-built or dedicated software and firmware.

Application server 598 is connected to a database 599 which stores dictionaries, rules and models. Application server 598 is also connected to an optional cache server or proxy server 597 which communicates via an optional firewall 596 to the outside using an available data network 595. Network 595 may take the form of a wireless or wired network (e.g. Wireless Fidelity (WiFi), cellular, Ethernet, or other) and be part of any network infrastructure like the Internet, the Cloud, a proprietary network or a combination of any of them. Application server 598 implements the present innovative solution and communicates with a device used by a user to (ultimately) access the application server 598.

A user may connect to network 595 via any computing device or system, including laptop 504, desktop computer 593, tablet or mobile phone 592 (smartphone or simple device) or similar. Users may also connect via fixed telephones 591, both digital and analogue, connected to a digital telephony infrastructure or a Public Switched Telephone Network (PSTN) infrastructure which is then connected to digital data infrastructures. Third party or remote databases 594 may also be connected to network 595 and can be accessed by application server 598 or other dedicated or specialized hardware used for the implementation of the intent induction system.

In a variation of the above exemplary implementation of system 503, the user device 592, 593, 504 is equipped either with special software, or one or more special hardware processors, or a combination of the special software and hardware that implement the present innovative solution. As a result the present innovative solution is implemented at the user device 592, 593, 504 without the need for application server 598 and even without the need for cache server 597 and database 599. If database 599 is not used, then dictionaries, rules and models are stored in the user device 592, 593, 504.

Regardless of which of the above two exemplary implementations is used, the output of the application of the present innovative solution, i.e. the intents, is stored either in database 599 or at the user device 592, 593, 504 to create or update intent dictionaries.

After induction of user intent (with methodologies 100, 200 and the intent dictionaries), the user intent is mapped either at application server 598 or at the user device 592, 593, 504 onto an action to be taken. This action is then sent either by the application server 598 or by the user device 592, 593, 504 to an external server 583. External server 583 may be an application server (e.g. forming part of a banking system, a search engine, a hospital system, etc.) or other type and is connected to network 595.

Example Hardware Architecture of an Application Server or Other Hardware Implementing the Intent Induction System

FIG. 9A shows the basic hardware architecture of an application server or other hardware implementing the intent induction system. Application Server or other hardware 600 comprises a microprocessor 610, a memory 620, a screen adapter 630, a hard-disk 640, a graphics processor 650, a communications interface adapter 660, and a UI adapter 670. Application Server 600 may also contain other components which are not shown in FIG. 9A or lack some of the components shown in FIG. 9A. Components 630, 640, 650, 670 are optional.

FIG. 9B shows a system for intent induction using multiple processors. System 680 has more than one processor: processor_1 683, processor_2 686, . . . , processor_n 689. These processors are connected via a bus (not shown) and may function in a first exemplary implementation in a peer-to-peer setup and in a second exemplary implementation in a master-slave setup where one of the processors acts as master and the other processors act as slaves. Processors 683, 686, 689 may each be configured to execute one or more modules 500.

The use of processors 683, 686, 689 allows faster operation times for the intent induction system and allows concurrent use by multiple users while allowing easy scale up even during live operation.

In other exemplary implementations, processors 683, 686, 689 may execute modules 500 in a redundant mode to enable uninterrupted intent induction system operation in the event of hardware failure of any of processors 683, 686, 689.

FIG. 9C shows a system for intent induction using multiple processing cores. System 690 has more than one processing core: processor_1 693, processor_2 696, . . . , processor_n 699. These processing cores are connected via a bus (not shown) and may function in a first exemplary implementation in a peer-to-peer setup and in a second exemplary implementation in a master-slave setup where one of the processing cores acts as master and the other processing cores act as slaves. Processing cores 693, 696, 699 may each be configured to execute one or more modules 500.

The use of processing cores 693, 696, 699 allows faster operation times for the intent induction system and allows concurrent use by multiple users while allowing easy scale up even during live operation.

In other exemplary implementations, processing cores 693, 696, 699 may execute modules 500 in a redundant mode to enable uninterrupted intent induction system operation in the event of hardware failure of any of processing cores 693, 696, 699.

In another exemplary implementation, each or some of processors 683, 686, 689 have multiple processing cores like 693, 696, 699.

Example Software Components of an Application Server

FIG. 10 shows the basic software components 700 running on an application server. They comprise an Operating System (OS) 710, Utilities 720, an Application Server Software 730, at least one Application or Web Service 740, and at least one Hardware driver 750. Additional software components may run at the application server while some of those shown in FIG. 10 may be omitted. One or more of software components 700 may be instantiated more than once to help speed up operation of the intent induction system and support easy scale up to cater for the needs of several concurrent users.

Example Software Components of a Device

FIG. 11 shows the main Software Components of a device. At the lowest layer of software components 800 are Device-Specific Capabilities 860, that is, the device-specific commands for controlling the various device hardware components. At the higher layers lie an OS 850, Virtual Machines 840 (like a Java Virtual Machine or other), Device/User Manager 830, Application Manager 820, and at the top layer, Applications 810. These applications may access, manipulate, transform and display data and communicate with other devices and may use any protocol, standard or proprietary, used by the devices they run on or other devices or systems they connect to.

The above exemplary implementations are intended for use either as a standalone system or method in any conceivable scientific and business domain, or as part of other scientific and business methods, processes and systems.

The above descriptions of exemplary implementations are simplified and do not include hardware and software elements that are used in the implementations but are not part of the current invention, are not needed for the understanding of the implementations, and are obvious to any person of ordinary skill in the related art. Furthermore, variations of the described method, system architecture, and software architecture are possible, where, for instance, method steps, and hardware and software elements may be rearranged, omitted, or newly added.

Various implementations of the invention are described above in the Detailed Description. While these descriptions directly describe the above implementations, it is understood that those skilled in the art may conceive modifications and/or variations to the specific implementations shown and described herein unless specifically excluded. Any such modifications or variations that fall within the purview of this description are intended to be included therein as well. Unless specifically noted, it is the intention of the inventor that the words and phrases in the specification and claims be given the ordinary and accustomed meanings to those of ordinary skill in the applicable art(s).

The foregoing description of a preferred embodiment and best mode of the invention known to the applicant at this time of filing the application has been presented and is intended for the purposes of illustration and description. It is not intended to be exhaustive or limit the invention to the precise form disclosed and many modifications and variations are possible in the light of the above teachings. The embodiment was chosen and described in order to best explain the principles of the invention and its practical application and to enable others skilled in the art to best utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated. Therefore, it is intended that the invention not be limited to the particular embodiments disclosed for carrying out this invention, but that the invention will include all embodiments falling within the scope of the appended claims.

In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer or any other device or apparatus operating as a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The previous description of the disclosed exemplary embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these exemplary embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

What is claimed is:
 1. A system for updating an intent library, the system comprising: a syntactic parser arranged to process a sequence of word tokens and control characters of at least one sentence in a corpus and produce words and dependencies between the words; a semantic analyzer arranged to process the words and dependencies between the words for extracting a set of keywords and arranged to map the set of keywords to action (A), modifier (M) and object (O) semantic categories and create ordered AMO triplets; an embeddings processor arranged to convert the set of extracted keywords in the ordered AMO triplets into keyword embedding vectors; a clustering processor arranged to cluster the reduced dimension keyword embedding vectors, where each keyword cluster contains semantically similar keywords; and an intent calculator arranged to calculate cluster relations, and store the intent clusters and the intents the clusters represent to the intent library.
 2. The system of claim 1, further comprising a pre-processor arranged to eliminate words and marks that have no linguistic value from a corpus, and arranged to create the sequence of word tokens and pairs of sentence boundary control characters, where the corpus comprises at least one sentence.
 3. The system of claim 1, where the intent calculator is further arranged to validate an intent semantic structure.
 4. The system of claim 1, where the intent calculator is further configured to: assign intent labels to intent clusters, which labels are found in the intent library; and store the intent labels to the intent library.
 5. The system of claim 1, where the pre-processor, the syntactic parser, the semantic analyzer, the embeddings processor, the clustering processor, and the intent calculator are implemented in one of an application server, a user device, a multi-processor system, a multicore-processor, and a multi-processor system where each processor is a multi-core processor.
 6. The system of claim 1, further comprising: an action processor arranged to map each intent onto one or more actions and output each action to at least one external system.
 7. A server configured to cluster keywords comprising: a communications interface arranged to receive text from at least one of an automated speech recognition (ASR) module and a user interface, the text forming at least one sentence in a corpus; and a processor arranged to: syntactically parse a sequence of word tokens and control characters of the at least one sentence in the corpus to produce words and dependencies between the words; semantically analyze the words and dependencies between the words for extracting a set of keywords and map the set of keywords to action (A), modifier (M) and object (O) semantic categories and create ordered AMO triplets; convert the extracted set of keywords in the ordered AMO triplets into keyword embedding vectors; and cluster the reduced dimension keyword embedding vectors, where each keyword cluster contains semantically similar keywords.
 8. The server of claim 7, wherein the processor is further configured to: i) eliminate words and marks that have no linguistic value from the corpus, and ii) create the sequence of word tokens and pairs of sentence boundary control characters.
 9. The server of claim 8, wherein the processor is further configured to calculate cluster relations, create intent templates, fill empty positions in the intent templates, and store the intent clusters and the intents the clusters represent to an intent library.
 10. The server of claim 6, wherein the processor is further configured to: assign intent labels to intent clusters, which labels are found in the intent library; and store the intent labels to the intent library.
 11. A computer implemented method for updating an intent library, the method comprising: pre-processing a corpus to eliminate words and symbols that have no linguistic value, where the corpus comprises at least one sentence, and to create a sequence of word tokens and pairs of sentence boundary control characters; syntactically processing the sequence of tokens to produce a grammatical-syntactical representation of the at least one sentence in the corpus; semantically processing the grammatical and/or syntactical representation of the at least one sentence in the corpus to extract a set of keywords; mapping each extracted keyword in the set of extracted keywords to one of action (A), modifier (M) and object (O) semantic category; representing the order of appearance of the set of extracted keywords as different levels of actions (A), modifiers (M) and objects (O); calculating binary relations between the set of extracted keywords; combining and prioritizing the binary relations into ordered AMO triplets, where each AMO triplet of the ordered AMO triplets describes one intent and contains at least one keyword; converting the extracted keywords in the ordered AMO triplets into keyword embedding vectors; mapping the set of extracted keywords in the ordered AMO triplets onto an embedding space; clustering the keyword embedding vectors; creating cluster combinations; and entering the cluster combinations into the intent library.
 12. The method of claim 11, where the mapping depends on the keyword category and order of appearance in the sentence.
 13. The method of claim 11, further comprising: calculating sentence embedding vectors from the sentence's keyword embedding vectors, and reduced dimension keyword embedding vectors; clustering the sentence embedding vectors, where each sentence cluster contains semantically similar sentences, which sentences in a cluster express a single intent; and entering the sentence clusters into the intent library.
 14. The method of claim 13, where each sentence embedding vector is calculated by concatenating: a first vector which is the max pooling of the keyword embeddings of the sentence, and a second vector calculated as the weighted average of the keyword embedding vectors of the sentence.
 15. The method of claim 14, where the weights in the second vector are calculated using the frequencies of words in an English Wiki dump.
 16. The method of claim 13, where each sentence embedding vector is calculated as the weighted centroids average of the keyword embedding vectors of the sentence.
 17. The method of claim 16, where the weights applied to the centroids of the keyword embedding vectors are each selected so that: a first weight is applied to keywords mapped onto the action semantic category; a second weight is applied to keywords mapped onto the modifier semantic category; and a third weight is applied to keywords mapped onto the object semantic category.
 18. The method of claim 13, further comprising using the sentence clusters to validate the keyword clusters.
 19. The method of claim 11 further comprising: grouping cluster relations using the keyword types and the keyword clusters they connect; converting the cluster relations into intents; using heuristics to convert relations between keyword types in AMO triplets and across AMO Triplets into intent templates; and filling empty positions in the intent templates with word tokens from intent dictionaries.
 20. The method of claim 11, further comprising mapping the intent onto an action.
 21. A non-transitory computer program product that causes a system to update an intent library, the non-transitory computer program product having instructions to: pre-process a corpus to eliminate words and marks that have no linguistic value, where the corpus comprises at least one sentence, and to create a sequence of word tokens and pairs of sentence boundary control characters; syntactically process the sequence of word tokens and pairs to produce a grammatical or syntactic representation of the at least one sentence in the corpus; semantically process the grammatical or syntactic representation of the at least one sentence in the corpus to extract a set of keywords; map each of the extracted keywords to one of action (A), modifier (M) and object (O) semantic category; represent the order of appearance of the extracted set of keywords as different levels of actions (A), modifiers (M) and objects (O); calculate binary relations between the extracted keywords; combine and prioritize the binary relations into ordered AMO triplets, where each AMO triplet describes one intent and contains at least one keyword; convert the extracted set of keywords in the ordered AMO triplets into keyword embedding vectors; map the set of extracted keywords in the ordered AMO triplets onto an embedding space; cluster the keyword embedding vectors; create cluster combinations; and enter the cluster combinations into the intent library.
 22. The non-transitory computer program product of claim 21, where the mapping depends on the keyword category and order of appearance in the sentence.
 23. The non-transitory computer program product of claim 21, further comprising instructions to: calculate sentence embedding vectors from the sentence's keyword embedding vectors, and reduced dimension keyword embedding vectors by one of (i) concatenating a first vector calculated by concatenating the keyword embeddings of the sentence, and a second vector calculated as the weighted average of the keyword embedding vectors of the sentence, where the weights in the second vector are calculated using the frequencies of words in the English Wiki dump, and (ii) from the weighted centroids average of the keyword embedding vectors of the sentence, where the weights applied to the centroids of the keyword embedding vectors are each selected so that (a) a first weight is applied to keywords mapped onto the action semantic category, (b) a second weight is applied to keywords mapped onto the modifier semantic category, and (c) a third weight is applied to keywords mapped onto the object semantic category; cluster the sentence embedding vectors, where each sentence cluster contains semantically similar sentences, which sentences in a cluster express a single intent; and enter the sentence clusters into the intent library.
 24. The non-transitory computer program product of claim 23, further comprising instructions to use the sentence clusters to validate the keyword clusters.
 25. The non-transitory computer program product of claim 21 further comprising instructions to: group cluster relations using the keyword types and the keyword clusters the cluster relations connect; convert cluster relations into intents; use heuristics to convert relations between keyword types in AMO triplets and across AMO Triplets into intent templates; and fill empty positions in the intent templates with word tokens from intent dictionaries.